Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 5477006 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 1410 |
| Duplicate rows (%) | < 0.1% |
| Total size in memory | 1.1 GiB |
| Average record size in memory | 220.0 B |
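The duplicate-row figure above can be illustrated with a small standard-library sketch. It assumes that a "duplicate" means a distinct row pattern that occurs more than once, which is an assumption about the report rather than something it states. The helper `duplicate_groups` and the sample rows are hypothetical.

```python
from collections import Counter

def duplicate_groups(rows):
    """Return {row: count} for every row pattern that occurs more than once."""
    counts = Counter(tuple(row) for row in rows)
    return {row: n for row, n in counts.items() if n > 1}

# Illustrative rows only (price, date, time); not taken from the full dataset.
rows = [
    (4165000, "2019-08-01", "01:06:29"),
    (4165000, "2019-08-01", "01:06:29"),
    (6050000, "2018-02-19", "20:00:21"),
    (4165000, "2019-08-01", "01:06:29"),
]

groups = duplicate_groups(rows)
n_groups = len(groups)          # number of distinct duplicated patterns
share = n_groups / len(rows)    # fraction relative to all rows
```

Counting on tuples keeps the comparison exact across all columns; a real run would use every column of the dataset rather than three.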
Variable types
| Numeric | 10 |
|---|---|
| Categorical | 3 |
| Dataset has 1410 (< 0.1%) duplicate rows | Duplicates |
|---|---|
| date has a high cardinality: 1075 distinct values | High cardinality |
| time has a high cardinality: 86400 distinct values | High cardinality |
| geo_lon is highly correlated with region | High correlation |
| region is highly correlated with geo_lon | High correlation |
| level is highly correlated with levels | High correlation |
| levels is highly correlated with level | High correlation |
| rooms is highly correlated with area | High correlation |
| area is highly correlated with rooms | High correlation |
| price is highly correlated with area | High correlation |
| area is highly correlated with price and 2 other fields | High correlation |
| kitchen_area is highly correlated with area | High correlation |
| geo_lat is highly correlated with geo_lon | High correlation |
| geo_lon is highly correlated with geo_lat and 1 other field | High correlation |
| levels is highly correlated with level and 1 other field | High correlation |
| building_type is highly correlated with levels | High correlation |
| area is highly skewed (γ1 = 57.05613875) | Skewed |
| kitchen_area is highly skewed (γ1 = 452.5307552) | Skewed |
| building_type has 307165 (5.6%) zeros | Zeros |
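The skewness values (γ1) flagged above measure how asymmetric a distribution is. A minimal sketch of the moment-based definition follows; it is an assumption that this matches the report's estimator, which may apply a small-sample bias adjustment. The function name and sample values are illustrative.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical samples: one symmetric, one with a single extreme value.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [30, 38, 48, 63, 7856]

g_symmetric = skewness(symmetric)
g_right = skewness(right_tailed)
```

A symmetric sample gives a value of zero, while a single very large observation pulls the statistic strongly positive, which is the pattern the area and kitchen_area warnings point to.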
Reproduction
| Analysis started | 2021-08-07 18:20:20.529571 |
|---|---|
| Analysis finished | 2021-08-07 18:25:56.802176 |
| Duration | 5 minutes and 36.27 seconds |
| Software version | pandas-profiling v3.0.0 |
| Download configuration | config.json |
price
Numeric
| Distinct | 352726 |
|---|---|
| Distinct (%) | 6.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4422029.023 |
| Minimum | -2144967296 |
|---|---|
| Maximum | 2147483647 |
| Zeros | 23 |
| Zeros (%) | < 0.1% |
| Negative | 365 |
| Negative (%) | < 0.1% |
| Memory size | 41.8 MiB |
Quantile statistics
| Minimum | -2144967296 |
|---|---|
| 5-th percentile | 1150000 |
| Q1 | 1950000 |
| median | 2990000 |
| Q3 | 4802000 |
| 95-th percentile | 11395000 |
| Maximum | 2147483647 |
| Range | 4292450943 |
| Interquartile range (IQR) | 2852000 |
Descriptive statistics
| Standard deviation | 21507519.15 |
|---|---|
| Coefficient of variation (CV) | 4.863721844 |
| Kurtosis | 6278.420808 |
| Mean | 4422029.023 |
| Median Absolute Deviation (MAD) | 1242200 |
| Skewness | -14.22845762 |
| Sum | 2.421947949 × 10^13 |
| Variance | 4.625733802 × 10^14 |
| Monotonicity | Not monotonic |
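Several of the figures above are simple functions of the others, which gives a quick consistency check. The sketch below recomputes the range, the interquartile range, the coefficient of variation and the variance from the values reported for this variable. The variable names are illustrative only.

```python
# Values copied from the quantile and descriptive statistics tables above.
minimum, maximum = -2144967296, 2147483647
q1, q3 = 1950000, 4802000
mean, std = 4422029.023, 21507519.15

value_range = maximum - minimum   # reported as 4292450943
iqr = q3 - q1                     # reported as 2852000
cv = std / mean                   # reported as 4.863721844
variance = std ** 2               # reported as 4.625733802 x 10^14
```

Because the inputs are rounded in the report, the recomputed coefficient of variation and variance agree with the reported ones only up to rounding.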
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 2500000 | 65745 | 1.2% |
| 2300000 | 59842 | 1.1% |
| 3500000 | 57676 | 1.1% |
| 2200000 | 57505 | 1.0% |
| 2100000 | 56654 | 1.0% |
| 1650000 | 54242 | 1.0% |
| 1800000 | 53363 | 1.0% |
| 2600000 | 53131 | 1.0% |
| 2400000 | 51834 | 0.9% |
| 1750000 | 51529 | 0.9% |
| Other values (352716) | 4915485 |
| Value | Count | Frequency (%) |
| -2144967296 | 2 | < 0.1% |
| -2114967296 | 2 | < 0.1% |
| -2114150296 | 159 | |
| -2094967296 | 4 | < 0.1% |
| -2089967296 | 1 | < 0.1% |
| -2053850296 | 13 | < 0.1% |
| -2041757296 | 4 | < 0.1% |
| -2040742296 | 8 | < 0.1% |
| -1964967296 | 1 | < 0.1% |
| -1944967296 | 6 | < 0.1% |
| Value | Count | Frequency (%) |
| 2147483647 | 2 | < 0.1% |
| 2089477704 | 2 | < 0.1% |
| 2083290000 | 42 | |
| 2050000000 | 4 | < 0.1% |
| 2000000300 | 1 | < 0.1% |
| 2000000000 | 1 | < 0.1% |
| 1990000000 | 1 | < 0.1% |
| 1945382704 | 1 | < 0.1% |
| 1922580000 | 4 | < 0.1% |
| 1904032704 | 1 | < 0.1% |
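The maximum of 2147483647 equals the largest signed 32-bit integer, and the negative values listed above are what some prices above that limit would become after 32-bit wraparound. That reading is an assumption; the report does not confirm it. The sketch below shows the arithmetic, with `wrap_int32` and `unwrap_candidate` as hypothetical helper names.

```python
def wrap_int32(value):
    """Reduce an integer to the signed 32-bit range, as two's-complement overflow would."""
    return (value + 2**31) % 2**32 - 2**31

def unwrap_candidate(value):
    """For a negative value, return the smallest larger integer that wraps to it."""
    return value + 2**32 if value < 0 else value

wrapped = wrap_int32(2_150_000_000)        # a price just above the int32 maximum
candidate = unwrap_candidate(-2144967296)  # the reported minimum
```

If this reading is right, the negative prices would be candidates for repair rather than deletion, but that would need checking against the original data source.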
date
Categorical
| Distinct | 1075 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 350.0 MiB |
Length
| Max length | 10 |
|---|---|
| Median length | 10 |
| Mean length | 10 |
| Min length | 10 |
Characters and Unicode
| Total characters | 54770060 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
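As a rough illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one date string and one time string taken from the sample rows; `category_counts` is a hypothetical helper, not part of the profiling tool.

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count Unicode general categories (e.g. 'Nd', 'Pd') of the characters in text."""
    return Counter(unicodedata.category(ch) for ch in text)

date_counts = category_counts("2018-02-19")  # digits and hyphens
time_counts = category_counts("20:00:21")    # digits and colons
```

Here `Nd` is Decimal Number, `Pd` is Dash Punctuation and `Po` is Other Punctuation, which correspond to the category names used in the tables of this report.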
Unique
| Unique | 53 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | 2018-02-19 |
|---|---|
| 2nd row | 2018-02-27 |
| 3rd row | 2018-02-28 |
| 4th row | 2018-03-01 |
| 5th row | 2018-03-01 |
Common Values
| Value | Count | Frequency (%) |
| 2020-03-27 | 44764 | 0.8% |
| 2019-02-28 | 22362 | 0.4% |
| 2020-02-01 | 21815 | 0.4% |
| 2018-09-18 | 21696 | 0.4% |
| 2018-12-31 | 21235 | 0.4% |
| 2020-09-01 | 21132 | 0.4% |
| 2021-04-27 | 20179 | 0.4% |
| 2021-04-30 | 20147 | 0.4% |
| 2019-04-01 | 19759 | 0.4% |
| 2020-10-01 | 18686 | 0.3% |
| Other values (1065) | 5245231 |
Most occurring characters
| Value | Count | Frequency (%) |
| 0 | 13884262 | |
| 2 | 11188298 | |
| - | 10954012 | |
| 1 | 8638985 | |
| 9 | 3308723 | 6.0% |
| 8 | 1672331 | 3.1% |
| 3 | 1459531 | 2.7% |
| 4 | 1043895 | 1.9% |
| 7 | 947096 | 1.7% |
| 6 | 858651 | 1.6% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 43816048 | |
| Dash Punctuation | 10954012 | 20.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 13884262 | |
| 2 | 11188298 | |
| 1 | 8638985 | |
| 9 | 3308723 | 7.6% |
| 8 | 1672331 | 3.8% |
| 3 | 1459531 | 3.3% |
| 4 | 1043895 | 2.4% |
| 7 | 947096 | 2.2% |
| 6 | 858651 | 2.0% |
| 5 | 814276 | 1.9% |
Dash Punctuation
| Value | Count | Frequency (%) |
| - | 10954012 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 54770060 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 0 | 13884262 | |
| 2 | 11188298 | |
| - | 10954012 | |
| 1 | 8638985 | |
| 9 | 3308723 | 6.0% |
| 8 | 1672331 | 3.1% |
| 3 | 1459531 | 2.7% |
| 4 | 1043895 | 1.9% |
| 7 | 947096 | 1.7% |
| 6 | 858651 | 1.6% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 54770060 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 0 | 13884262 | |
| 2 | 11188298 | |
| - | 10954012 | |
| 1 | 8638985 | |
| 9 | 3308723 | 6.0% |
| 8 | 1672331 | 3.1% |
| 3 | 1459531 | 2.7% |
| 4 | 1043895 | 1.9% |
| 7 | 947096 | 1.7% |
| 6 | 858651 | 1.6% |
time
Categorical
| Distinct | 86400 |
|---|---|
| Distinct (%) | 1.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 339.5 MiB |
Length
| Max length | 8 |
|---|---|
| Median length | 8 |
| Mean length | 8 |
| Min length | 8 |
Characters and Unicode
| Total characters | 43816048 |
|---|---|
| Distinct characters | 11 |
| Distinct categories | 2 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 20:00:21 |
|---|---|
| 2nd row | 12:04:54 |
| 3rd row | 15:44:00 |
| 4th row | 11:24:52 |
| 5th row | 17:42:43 |
Common Values
| Value | Count | Frequency (%) |
| 16:15:49 | 197 | < 0.1% |
| 06:32:22 | 190 | < 0.1% |
| 06:36:17 | 189 | < 0.1% |
| 06:32:06 | 186 | < 0.1% |
| 06:32:15 | 184 | < 0.1% |
| 06:32:23 | 184 | < 0.1% |
| 06:36:02 | 183 | < 0.1% |
| 06:32:12 | 183 | < 0.1% |
| 06:36:15 | 183 | < 0.1% |
| 06:33:50 | 182 | < 0.1% |
| Other values (86390) | 5475145 |
Most occurring characters
| Value | Count | Frequency (%) |
| : | 10954012 | |
| 1 | 6413260 | |
| 0 | 5459482 | |
| 2 | 4038374 | 9.2% |
| 3 | 3554841 | 8.1% |
| 4 | 3443261 | 7.9% |
| 5 | 3386306 | 7.7% |
| 6 | 1758466 | 4.0% |
| 8 | 1614860 | 3.7% |
| 7 | 1610957 | 3.7% |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 32862036 | |
| Other Punctuation | 10954012 | 25.0% |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 6413260 | |
| 0 | 5459482 | |
| 2 | 4038374 | |
| 3 | 3554841 | |
| 4 | 3443261 | |
| 5 | 3386306 | |
| 6 | 1758466 | 5.4% |
| 8 | 1614860 | 4.9% |
| 7 | 1610957 | 4.9% |
| 9 | 1582229 | 4.8% |
Other Punctuation
| Value | Count | Frequency (%) |
| : | 10954012 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 43816048 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| : | 10954012 | |
| 1 | 6413260 | |
| 0 | 5459482 | |
| 2 | 4038374 | 9.2% |
| 3 | 3554841 | 8.1% |
| 4 | 3443261 | 7.9% |
| 5 | 3386306 | 7.7% |
| 6 | 1758466 | 4.0% |
| 8 | 1614860 | 3.7% |
| 7 | 1610957 | 3.7% |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 43816048 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| : | 10954012 | |
| 1 | 6413260 | |
| 0 | 5459482 | |
| 2 | 4038374 | 9.2% |
| 3 | 3554841 | 8.1% |
| 4 | 3443261 | 7.9% |
| 5 | 3386306 | 7.7% |
| 6 | 1758466 | 4.0% |
| 8 | 1614860 | 3.7% |
| 7 | 1610957 | 3.7% |
geo_lat
Numeric
| Distinct | 448318 |
|---|---|
| Distinct (%) | 8.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 54.03826356 |
| Minimum | 41.4590611 |
|---|---|
| Maximum | 71.9803994 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 41.8 MiB |
Quantile statistics
| Minimum | 41.4590611 |
|---|---|
| 5-th percentile | 44.8950433 |
| Q1 | 53.3776756 |
| median | 55.171385 |
| Q3 | 56.2261305 |
| 95-th percentile | 59.96686658 |
| Maximum | 71.9803994 |
| Range | 30.5213383 |
| Interquartile range (IQR) | 2.8484549 |
Descriptive statistics
| Standard deviation | 4.622757917 |
|---|---|
| Coefficient of variation (CV) | 0.08554601153 |
| Kurtosis | 0.1878352641 |
| Mean | 54.03826356 |
| Median Absolute Deviation (MAD) | 1.1407398 |
| Skewness | -0.981604419 |
| Sum | 295967893.7 |
| Variance | 21.36989076 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 55.0303931 | 106177 | 1.9% |
| 55.014108 | 29577 | 0.5% |
| 55.0128684 | 26030 | 0.5% |
| 54.9471407 | 21055 | 0.4% |
| 59.939084 | 20873 | 0.4% |
| 55.017756 | 18885 | 0.3% |
| 55.0176723 | 17999 | 0.3% |
| 55.0139939 | 16518 | 0.3% |
| 55.0128402 | 15983 | 0.3% |
| 55.7943584 | 13626 | 0.2% |
| Other values (448308) | 5190283 |
| Value | Count | Frequency (%) |
| 41.4590611 | 2 | < 0.1% |
| 41.459089 | 40 | |
| 41.6168777 | 1 | < 0.1% |
| 41.6175281 | 39 | |
| 41.6201334 | 2 | < 0.1% |
| 41.6740972 | 1 | < 0.1% |
| 41.6748171 | 1 | < 0.1% |
| 41.677346 | 2 | < 0.1% |
| 41.677411 | 1 | < 0.1% |
| 41.6922439 | 3 | < 0.1% |
| Value | Count | Frequency (%) |
| 71.9803994 | 1 | < 0.1% |
| 71.6389699 | 1 | < 0.1% |
| 71.6362512 | 1 | < 0.1% |
| 71.634255 | 1 | < 0.1% |
| 70.6206033 | 1 | < 0.1% |
| 69.6367371 | 1 | < 0.1% |
| 69.4994027 | 2 | < 0.1% |
| 69.4987091 | 3 | |
| 69.4984929 | 1 | < 0.1% |
| 69.4984049 | 5 |
geo_lon
Numeric
| Distinct | 449701 |
|---|---|
| Distinct (%) | 8.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 53.24433248 |
| Minimum | 19.890196 |
|---|---|
| Maximum | 162.5360775 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 41.8 MiB |
Quantile statistics
| Minimum | 19.890196 |
|---|---|
| 5-th percentile | 30.318554 |
| Q1 | 37.777895 |
| median | 43.067741 |
| Q3 | 65.6489496 |
| 95-th percentile | 83.6773749 |
| Maximum | 162.5360775 |
| Range | 142.6458815 |
| Interquartile range (IQR) | 27.8710546 |
Descriptive statistics
| Standard deviation | 20.74762836 |
|---|---|
| Coefficient of variation (CV) | 0.3896682969 |
| Kurtosis | -0.3778285941 |
| Mean | 53.24433248 |
| Median Absolute Deviation (MAD) | 8.7025264 |
| Skewness | 0.8397470317 |
| Sum | 291619528.4 |
| Variance | 430.4640825 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 83.0155452 | 106177 | 1.9% |
| 83.0016615 | 29577 | 0.5% |
| 82.9999987 | 26030 | 0.5% |
| 82.9585961 | 21055 | 0.4% |
| 30.315879 | 20873 | 0.4% |
| 83.003578 | 18886 | 0.3% |
| 83.003522 | 17999 | 0.3% |
| 83.0033194 | 16518 | 0.3% |
| 83.0018862 | 15983 | 0.3% |
| 49.1114975 | 13626 | 0.2% |
| Other values (449691) | 5190282 |
| Value | Count | Frequency (%) |
| 19.890196 | 2 | |
| 19.9030946 | 4 | |
| 19.9039311 | 1 | < 0.1% |
| 19.9046331 | 1 | < 0.1% |
| 19.90544 | 1 | < 0.1% |
| 19.906382 | 1 | < 0.1% |
| 19.9064655 | 1 | < 0.1% |
| 19.9067518 | 1 | < 0.1% |
| 19.9082648 | 2 | |
| 19.913043 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 162.5360775 | 1 | |
| 161.3329936 | 1 | |
| 161.3285973 | 1 | |
| 161.3278436 | 1 | |
| 161.3244357 | 1 | |
| 159.8367187 | 1 | |
| 158.7133286 | 1 | |
| 158.7126556 | 2 | |
| 158.7103041 | 1 | |
| 158.6990352 | 1 |
region
Numeric
| Distinct | 84 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 4307.140936 |
| Minimum | 3 |
|---|---|
| Maximum | 61888 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 41.8 MiB |
Quantile statistics
| Minimum | 3 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 2661 |
| median | 2922 |
| Q3 | 6171 |
| 95-th percentile | 9654 |
| Maximum | 61888 |
| Range | 61885 |
| Interquartile range (IQR) | 3510 |
Descriptive statistics
| Standard deviation | 3308.050175 |
|---|---|
| Coefficient of variation (CV) | 0.7680385258 |
| Kurtosis | -0.7214476444 |
| Mean | 4307.140936 |
| Median Absolute Deviation (MAD) | 2360 |
| Skewness | 0.5898172921 |
| Sum | 2.359023675 × 10^10 |
| Variance | 10943195.96 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 9654 | 1049435 | |
| 2843 | 637224 | |
| 81 | 500368 | 9.1% |
| 2661 | 461820 | 8.4% |
| 3 | 439511 | 8.0% |
| 6171 | 237289 | 4.3% |
| 2922 | 230545 | 4.2% |
| 3230 | 222652 | 4.1% |
| 5282 | 155645 | 2.8% |
| 3991 | 141633 | 2.6% |
| Other values (74) | 1400884 |
| Value | Count | Frequency (%) |
| 3 | 439511 | |
| 69 | 77 | < 0.1% |
| 81 | 500368 | |
| 821 | 2519 | < 0.1% |
| 1010 | 48396 | 0.9% |
| 1491 | 3857 | 0.1% |
| 1901 | 12 | < 0.1% |
| 2072 | 63128 | 1.2% |
| 2328 | 8160 | 0.1% |
| 2359 | 22216 | 0.4% |
| Value | Count | Frequency (%) |
| 61888 | 5 | < 0.1% |
| 16705 | 139 | < 0.1% |
| 14880 | 357 | < 0.1% |
| 14368 | 593 | < 0.1% |
| 13919 | 9913 | |
| 13913 | 735 | < 0.1% |
| 13098 | 256 | < 0.1% |
| 11991 | 6382 | |
| 11416 | 5243 | |
| 11171 | 11654 |
building_type
Numeric
| Distinct | 6 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.948966278 |
| Minimum | 0 |
|---|---|
| Maximum | 5 |
| Zeros | 307165 |
| Zeros (%) | 5.6% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 41.8 MiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 3 |
| 95-th percentile | 3 |
| Maximum | 5 |
| Range | 5 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 1.038536638 |
|---|---|
| Coefficient of variation (CV) | 0.5328653703 |
| Kurtosis | -0.9864830136 |
| Mean | 1.948966278 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 0.03600047645 |
| Sum | 10674500 |
| Variance | 1.078558348 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=6)
| Value | Count | Frequency (%) |
| 1 | 1955661 | |
| 3 | 1892756 | |
| 2 | 1130731 | |
| 0 | 307165 | 5.6% |
| 4 | 174356 | 3.2% |
| 5 | 16337 | 0.3% |
| Value | Count | Frequency (%) |
| 0 | 307165 | 5.6% |
| 1 | 1955661 | |
| 2 | 1130731 | |
| 3 | 1892756 | |
| 4 | 174356 | 3.2% |
| 5 | 16337 | 0.3% |
| Value | Count | Frequency (%) |
| 5 | 16337 | 0.3% |
| 4 | 174356 | 3.2% |
| 3 | 1892756 | |
| 2 | 1130731 | |
| 1 | 1955661 | |
| 0 | 307165 | 5.6% |
level
Numeric
| Distinct | 39 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.214529982 |
| Minimum | 1 |
|---|---|
| Maximum | 39 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 41.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 2 |
| median | 5 |
| Q3 | 9 |
| 95-th percentile | 16 |
| Maximum | 39 |
| Range | 38 |
| Interquartile range (IQR) | 7 |
Descriptive statistics
| Standard deviation | 4.957419277 |
|---|---|
| Coefficient of variation (CV) | 0.7977142746 |
| Kurtosis | 2.208892357 |
| Mean | 6.214529982 |
| Median Absolute Deviation (MAD) | 3 |
| Skewness | 1.42927485 |
| Sum | 34037018 |
| Variance | 24.57600589 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=39)
| Value | Count | Frequency (%) |
| 2 | 707923 | |
| 1 | 677583 | |
| 3 | 610523 | |
| 5 | 572040 | |
| 4 | 565548 | |
| 6 | 331771 | 6.1% |
| 8 | 306960 | 5.6% |
| 7 | 301103 | 5.5% |
| 9 | 297467 | 5.4% |
| 10 | 232850 | 4.3% |
| Other values (29) | 873238 |
| Value | Count | Frequency (%) |
| 1 | 677583 | |
| 2 | 707923 | |
| 3 | 610523 | |
| 4 | 565548 | |
| 5 | 572040 | |
| 6 | 331771 | |
| 7 | 301103 | |
| 8 | 306960 | |
| 9 | 297467 | |
| 10 | 232850 | 4.3% |
| Value | Count | Frequency (%) |
| 39 | 28 | < 0.1% |
| 38 | 40 | < 0.1% |
| 37 | 105 | < 0.1% |
| 36 | 151 | < 0.1% |
| 35 | 136 | < 0.1% |
| 34 | 262 | < 0.1% |
| 33 | 772 | |
| 32 | 1102 | |
| 31 | 1162 | |
| 30 | 1417 |
levels
Numeric
| Distinct | 39 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 11.39892032 |
| Minimum | 1 |
|---|---|
| Maximum | 39 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 41.8 MiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 5 |
| median | 10 |
| Q3 | 16 |
| 95-th percentile | 25 |
| Maximum | 39 |
| Range | 38 |
| Interquartile range (IQR) | 11 |
Descriptive statistics
| Standard deviation | 6.535733868 |
|---|---|
| Coefficient of variation (CV) | 0.573364291 |
| Kurtosis | 0.148753319 |
| Mean | 11.39892032 |
| Median Absolute Deviation (MAD) | 5 |
| Skewness | 0.8296565373 |
| Sum | 62431955 |
| Variance | 42.7158172 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=39)
| Value | Count | Frequency (%) |
| 5 | 1002943 | |
| 10 | 941125 | |
| 9 | 734812 | |
| 17 | 351218 | 6.4% |
| 16 | 308093 | 5.6% |
| 25 | 215201 | 3.9% |
| 12 | 166984 | 3.0% |
| 3 | 164814 | 3.0% |
| 4 | 162246 | 3.0% |
| 18 | 152007 | 2.8% |
| Other values (29) | 1277563 |
| Value | Count | Frequency (%) |
| 1 | 23722 | 0.4% |
| 2 | 117822 | 2.2% |
| 3 | 164814 | 3.0% |
| 4 | 162246 | 3.0% |
| 5 | 1002943 | |
| 6 | 109277 | 2.0% |
| 7 | 63092 | 1.2% |
| 8 | 76137 | 1.4% |
| 9 | 734812 | |
| 10 | 941125 |
| Value | Count | Frequency (%) |
| 39 | 1970 | < 0.1% |
| 38 | 520 | < 0.1% |
| 37 | 1904 | < 0.1% |
| 36 | 1319 | < 0.1% |
| 35 | 1961 | < 0.1% |
| 34 | 916 | < 0.1% |
| 33 | 17158 | |
| 32 | 8272 | |
| 31 | 4538 | 0.1% |
| 30 | 5859 | 0.1% |
rooms
Numeric
| Distinct | 12 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.726173387 |
| Minimum | -2 |
|---|---|
| Maximum | 10 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 306552 |
| Negative (%) | 5.6% |
| Memory size | 41.8 MiB |
Quantile statistics
| Minimum | -2 |
|---|---|
| 5-th percentile | -1 |
| Q1 | 1 |
| median | 2 |
| Q3 | 2 |
| 95-th percentile | 3 |
| Maximum | 10 |
| Range | 12 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1.082132738 |
|---|---|
| Coefficient of variation (CV) | 0.6268968956 |
| Kurtosis | 0.9955762546 |
| Mean | 1.726173387 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | -0.2318678244 |
| Sum | 9454262 |
| Variance | 1.171011262 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=12)
| Value | Count | Frequency (%) |
| 1 | 2067013 | |
| 2 | 1827514 | |
| 3 | 1097354 | |
| -1 | 306209 | 5.6% |
| 4 | 152160 | 2.8% |
| 5 | 22576 | 0.4% |
| 6 | 2357 | < 0.1% |
| 7 | 788 | < 0.1% |
| 8 | 353 | < 0.1% |
| -2 | 343 | < 0.1% |
| Other values (2) | 339 | < 0.1% |
| Value | Count | Frequency (%) |
| -2 | 343 | < 0.1% |
| -1 | 306209 | 5.6% |
| 1 | 2067013 | |
| 2 | 1827514 | |
| 3 | 1097354 | |
| 4 | 152160 | 2.8% |
| 5 | 22576 | 0.4% |
| 6 | 2357 | < 0.1% |
| 7 | 788 | < 0.1% |
| 8 | 353 | < 0.1% |
| Value | Count | Frequency (%) |
| 10 | 1 | < 0.1% |
| 9 | 338 | < 0.1% |
| 8 | 353 | < 0.1% |
| 7 | 788 | < 0.1% |
| 6 | 2357 | < 0.1% |
| 5 | 22576 | 0.4% |
| 4 | 152160 | 2.8% |
| 3 | 1097354 | |
| 2 | 1827514 | |
| 1 | 2067013 |
area
Numeric
| Distinct | 12741 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 53.91824855 |
| Minimum | 0.07 |
|---|---|
| Maximum | 7856 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 41.8 MiB |
Quantile statistics
| Minimum | 0.07 |
|---|---|
| 5-th percentile | 29 |
| Q1 | 38 |
| median | 48.02 |
| Q3 | 63.13 |
| 95-th percentile | 93 |
| Maximum | 7856 |
| Range | 7855.93 |
| Interquartile range (IQR) | 25.13 |
Descriptive statistics
| Standard deviation | 33.352926 |
|---|---|
| Coefficient of variation (CV) | 0.6185832606 |
| Kurtosis | 8492.648526 |
| Mean | 53.91824855 |
| Median Absolute Deviation (MAD) | 11.98 |
| Skewness | 57.05613875 |
| Sum | 295310570.8 |
| Variance | 1112.417673 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 40 | 96602 | 1.8% |
| 45 | 95018 | 1.7% |
| 42 | 94570 | 1.7% |
| 44 | 94073 | 1.7% |
| 43 | 91727 | 1.7% |
| 60 | 88074 | 1.6% |
| 30 | 77211 | 1.4% |
| 38 | 75534 | 1.4% |
| 33 | 73824 | 1.3% |
| 32 | 72387 | 1.3% |
| Other values (12731) | 4617986 |
| Value | Count | Frequency (%) |
| 0.07 | 1 | |
| 0.22 | 1 | |
| 0.28 | 1 | |
| 0.32 | 1 | |
| 0.45 | 1 | |
| 0.48 | 2 | |
| 0.52 | 1 | |
| 0.53 | 2 | |
| 0.61 | 1 | |
| 0.64 | 1 |
| Value | Count | Frequency (%) |
| 7856 | 1 | |
| 7660 | 1 | |
| 7625 | 1 | |
| 7513.4 | 1 | |
| 7190 | 1 | |
| 6812.6 | 1 | |
| 6580 | 1 | |
| 5985 | 1 | |
| 5711.6 | 1 | |
| 5644 | 1 |
kitchen_area
Numeric
| Distinct | 4154 |
|---|---|
| Distinct (%) | 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 10.62839745 |
| Minimum | 0.01 |
|---|---|
| Maximum | 9999 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 41.8 MiB |
Quantile statistics
| Minimum | 0.01 |
|---|---|
| 5-th percentile | 5 |
| Q1 | 7 |
| median | 9.7 |
| Q3 | 12.7 |
| 95-th percentile | 20 |
| Maximum | 9999 |
| Range | 9998.99 |
| Interquartile range (IQR) | 5.7 |
Descriptive statistics
| Standard deviation | 9.792379875 |
|---|---|
| Coefficient of variation (CV) | 0.9213411448 |
| Kurtosis | 371626.0466 |
| Mean | 10.62839745 |
| Median Absolute Deviation (MAD) | 2.7 |
| Skewness | 452.5307552 |
| Sum | 58211796.61 |
| Variance | 95.89070361 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 6 | 520623 | 9.5% |
| 9 | 449543 | 8.2% |
| 10 | 369245 | 6.7% |
| 8 | 310615 | 5.7% |
| 12 | 288100 | 5.3% |
| 7 | 259294 | 4.7% |
| 5 | 256190 | 4.7% |
| 11 | 209125 | 3.8% |
| 14 | 158354 | 2.9% |
| 13 | 124148 | 2.3% |
| Other values (4144) | 2531769 |
| Value | Count | Frequency (%) |
| 0.01 | 12 | < 0.1% |
| 0.02 | 2 | < 0.1% |
| 0.03 | 3 | < 0.1% |
| 0.04 | 3 | < 0.1% |
| 0.05 | 27 | < 0.1% |
| 0.06 | 168 | |
| 0.07 | 71 | < 0.1% |
| 0.08 | 58 | < 0.1% |
| 0.09 | 209 | |
| 0.1 | 111 |
| Value | Count | Frequency (%) |
| 9999 | 1 | |
| 8235 | 1 | |
| 6500 | 1 | |
| 6270 | 1 | |
| 4949 | 1 | |
| 3000 | 2 | |
| 2500 | 1 | |
| 2110 | 1 | |
| 1958 | 1 | |
| 1500 | 1 |
object_type
Categorical
| Distinct | 2 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 304.5 MiB |
Length
| Max length | 2 |
|---|---|
| Median length | 1 |
| Mean length | 1.294539937 |
| Min length | 1 |
Characters and Unicode
| Total characters | 7090203 |
|---|---|
| Distinct characters | 1 |
| Distinct categories | 1 |
| Distinct scripts | 1 |
| Distinct blocks | 1 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | 1 |
|---|---|
| 2nd row | 1 |
| 3rd row | 1 |
| 4th row | 11 |
| 5th row | 1 |
Common Values
| Value | Count | Frequency (%) |
| 1 | 3863809 | |
| 11 | 1613197 |
Most occurring characters
| Value | Count | Frequency (%) |
| 1 | 7090203 |
Most occurring categories
| Value | Count | Frequency (%) |
| Decimal Number | 7090203 |
Most frequent character per category
Decimal Number
| Value | Count | Frequency (%) |
| 1 | 7090203 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Common | 7090203 |
Most frequent character per script
Common
| Value | Count | Frequency (%) |
| 1 | 7090203 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 7090203 |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| 1 | 7090203 |
Pearson's r
Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line does not affect r. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
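The calculation described above can be sketched in plain Python. This is a minimal illustration rather than the report's own implementation; `pearson_r` is a hypothetical helper, and the five rooms/area pairs are copied from the first sample rows purely as example input.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    # The 1/n factors of covariance and standard deviations cancel out.
    return cov / (dev_x * dev_y)

rooms = [3, 2, 3, 2, 2]
area = [82.6, 69.1, 66.0, 38.0, 60.0]

r = pearson_r(rooms, area)
r_rescaled = pearson_r(rooms, [2 * a + 5 for a in area])  # location and scale change
```

Rescaling one variable with a positive factor and shifting it leaves the coefficient unchanged, which is the invariance property mentioned in the text.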
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at catching nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
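A minimal sketch of that rank-based calculation follows. It assumes tied values receive the average of their ranks, which is a common convention but not something the text specifies; `average_ranks` and `spearman_rho` are hypothetical helper names.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    dx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    dy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (dx * dy)

# A nonlinear but strictly increasing relationship.
rho_cubic = spearman_rho([1, 2, 3, 4], [1, 8, 27, 64])
```

Because only the ordering matters, a strictly increasing nonlinear relationship such as the cubic one here still yields a perfect positive coefficient.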
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, determine the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
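The pair-counting definition above can be written directly. The sketch below implements exactly that formula (often called tau-a); library implementations frequently apply a tie correction instead, so results on data with many ties may differ. `kendall_tau_a` is a hypothetical helper name.

```python
from itertools import combinations

def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs, with no tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

tau_same = kendall_tau_a([1, 2, 3], [1, 2, 3])
tau_reversed = kendall_tau_a([1, 2, 3], [3, 2, 1])
tau_mixed = kendall_tau_a([1, 2, 3], [1, 3, 2])
```

This brute-force version inspects every pair, so it is only practical for small samples; it is meant to show the definition, not to be run on millions of rows.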
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
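The per-column nullity that these displays summarize amounts to counting missing cells in each column. A minimal sketch follows, assuming rows are dictionaries and that `None`, NaN and absent keys all count as missing; `missing_per_column` and the sample rows are hypothetical.

```python
def missing_per_column(rows, columns):
    """Count cells that are None, NaN, or absent, for each named column."""
    counts = {column: 0 for column in columns}
    for row in rows:
        for column in columns:
            value = row.get(column)
            is_nan = isinstance(value, float) and value != value  # NaN is not equal to itself
            if value is None or is_nan:
                counts[column] += 1
    return counts

# Hypothetical rows with deliberately missing cells.
rows = [
    {"price": 6050000, "area": 82.6},
    {"price": None, "area": float("nan")},
    {"price": 4000000},
]
missing = missing_per_column(rows, ["price", "area"])
```

For the dataset profiled here every column would come out at zero, matching the "Missing cells | 0" entry in the dataset statistics.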
First rows
| | price | date | time | geo_lat | geo_lon | region | building_type | level | levels | rooms | area | kitchen_area | object_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 6050000 | 2018-02-19 | 20:00:21 | 59.805808 | 30.376141 | 2661 | 1 | 8 | 10 | 3 | 82.6 | 10.8 | 1 |
| 1 | 8650000 | 2018-02-27 | 12:04:54 | 55.683807 | 37.297405 | 81 | 3 | 5 | 24 | 2 | 69.1 | 12.0 | 1 |
| 2 | 4000000 | 2018-02-28 | 15:44:00 | 56.295250 | 44.061637 | 2871 | 1 | 5 | 9 | 3 | 66.0 | 10.0 | 1 |
| 3 | 1850000 | 2018-03-01 | 11:24:52 | 44.996132 | 39.074783 | 2843 | 4 | 12 | 16 | 2 | 38.0 | 5.0 | 11 |
| 4 | 5450000 | 2018-03-01 | 17:42:43 | 55.918767 | 37.984642 | 81 | 3 | 13 | 14 | 2 | 60.0 | 10.0 | 1 |
| 5 | 3300000 | 2018-03-02 | 21:18:42 | 55.908253 | 37.726448 | 81 | 1 | 4 | 5 | 1 | 32.0 | 6.0 | 1 |
| 6 | 4704280 | 2018-03-04 | 12:35:25 | 55.621097 | 37.431002 | 3 | 2 | 1 | 25 | 1 | 31.7 | 6.0 | 11 |
| 7 | 3600000 | 2018-03-04 | 20:52:38 | 59.875526 | 30.395457 | 2661 | 1 | 2 | 5 | 1 | 31.1 | 6.0 | 1 |
| 8 | 3390000 | 2018-03-05 | 07:07:05 | 53.195031 | 50.106952 | 3106 | 2 | 4 | 24 | 2 | 64.0 | 13.0 | 11 |
| 9 | 2800000 | 2018-03-06 | 09:57:10 | 55.736972 | 38.846457 | 81 | 1 | 9 | 10 | 2 | 55.0 | 8.0 | 1 |
Last rows
| | price | date | time | geo_lat | geo_lon | region | building_type | level | levels | rooms | area | kitchen_area | object_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5476996 | 6400000 | 2021-05-01 | 20:13:41 | 55.904292 | 37.984368 | 81 | 3 | 4 | 17 | 3 | 82.0 | 10.6 | 1 |
| 5476997 | 7200000 | 2021-05-01 | 20:13:42 | 59.772947 | 30.056530 | 3446 | 2 | 2 | 3 | 2 | 59.0 | 22.3 | 11 |
| 5476998 | 4900000 | 2021-05-01 | 20:13:43 | 59.850103 | 30.357299 | 2661 | 1 | 2 | 5 | 1 | 31.0 | 6.0 | 1 |
| 5476999 | 12850000 | 2021-05-01 | 20:13:47 | 55.701280 | 37.642654 | 3 | 2 | 12 | 24 | 1 | 41.0 | 9.0 | 1 |
| 5477000 | 9000000 | 2021-05-01 | 20:13:48 | 44.051357 | 42.867573 | 2900 | 3 | 4 | 5 | 4 | 178.0 | 20.0 | 1 |
| 5477001 | 19739760 | 2021-05-01 | 20:13:58 | 55.804736 | 37.750898 | 3 | 1 | 8 | 17 | 4 | 93.2 | 13.8 | 11 |
| 5477002 | 12503160 | 2021-05-01 | 20:14:01 | 55.841415 | 37.489624 | 3 | 2 | 17 | 32 | 2 | 45.9 | 6.6 | 11 |
| 5477003 | 8800000 | 2021-05-01 | 20:14:04 | 56.283909 | 44.075408 | 2871 | 2 | 4 | 17 | 3 | 86.5 | 11.8 | 1 |
| 5477004 | 11831910 | 2021-05-01 | 20:14:12 | 55.804736 | 37.750898 | 3 | 1 | 8 | 33 | 2 | 52.1 | 18.9 | 11 |
| 5477005 | 13316200 | 2021-05-01 | 20:14:15 | 55.860240 | 37.540356 | 3 | 2 | 10 | 23 | 2 | 55.6 | 20.8 | 11 |
Most frequently occurring
| | price | date | time | geo_lat | geo_lon | region | building_type | level | levels | rooms | area | kitchen_area | object_type | # duplicates |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 822 | 4165000 | 2019-08-01 | 01:06:29 | 55.829360 | 37.814968 | 3 | 1 | 5 | 14 | 1 | 39.1 | 8.7 | 1 | 5 |
| 1284 | 15651445 | 2020-03-12 | 07:34:04 | 55.880945 | 37.541180 | 3 | 2 | 15 | 24 | 3 | 88.3 | 13.2 | 11 | 5 |
| 1332 | 17957527 | 2020-03-12 | 11:20:50 | 55.881585 | 37.683930 | 3 | 2 | 13 | 24 | 2 | 89.9 | 19.8 | 11 | 5 |
| 1367 | 19272327 | 2020-03-12 | 10:08:30 | 55.881585 | 37.683930 | 3 | 2 | 17 | 24 | 3 | 94.7 | 16.0 | 11 | 5 |
| 1370 | 19315039 | 2020-03-12 | 07:54:36 | 55.881585 | 37.683930 | 3 | 2 | 16 | 24 | 3 | 95.4 | 17.9 | 11 | 5 |
| 1372 | 19470003 | 2020-03-12 | 07:54:36 | 55.881585 | 37.683930 | 3 | 2 | 17 | 24 | 3 | 95.4 | 17.9 | 11 | 5 |
| 1386 | 20504389 | 2020-03-12 | 11:01:07 | 55.881585 | 37.683930 | 3 | 2 | 18 | 24 | 3 | 103.9 | 12.4 | 11 | 5 |
| 307 | 2173600 | 2019-09-11 | 16:58:10 | 60.057175 | 30.266266 | 2661 | 2 | 11 | 25 | 1 | 20.9 | 5.0 | 11 | 4 |
| 951 | 4949880 | 2020-01-22 | 14:43:44 | 55.770856 | 37.517985 | 3 | 2 | 13 | 22 | 1 | 24.7 | 3.0 | 11 | 4 |
| 1107 | 7935809 | 2020-03-12 | 07:54:37 | 55.881585 | 37.683930 | 3 | 2 | 6 | 24 | 1 | 37.5 | 13.1 | 11 | 4 |